Analysis of the Datasets and Strategy for AI-Powered Renewable Energy Consumption & Forecasting Dashboard¶
Objectives¶
Understand the Dataset: Analyze the provided datasets to assess their structure, completeness, and relevance to the project goals.
Explore Renewable Energy Consumption Trends: Identify historical patterns in renewable energy consumption at national, continental, and global levels.
Assess the Impact of Investments on Renewable Energy Growth: Examine how public and private investments correlate with the increase in renewable energy production.
Evaluate Policy and Macroeconomic Influences: Analyze how economic factors, government policies, and global trends impact renewable energy adoption.
Prepatre a aData set for Develop an AI-Powered Interactive Dashboard: Create a cleaned dataset for a web-based visualization platform to display energy consumption trends, forecasts, and policy insights.
Ensure Data Consistency and Integrity: Preprocess, clean, and integrate multiple datasets for accurate analysis and forecasting.
Questions:¶
Data Understanding & Preparation¶
What are the key variables in each dataset, and how do they relate to renewable energy consumption and forecasting?
Are there any missing values or inconsistencies that need to be addressed before analysis?
How can different datasets be merged effectively for better insights and predictions?
Trend Analysis & Forecasting¶
What are the historical trends in renewable energy consumption for different countries and continents?
How has the share of renewable energy changed over the past few decades?
Which forecasting model (ARIMA, Prophet, LSTM) provides the most accurate predictions for future renewable energy adoption?
What are the expected trends in renewable energy adoption for the next 5–10 years?
Investment & Economic Impact¶
How does public and private investment impact renewable energy production and adoption?
Is there a correlation between GDP growth and increased renewable energy consumption?
Which countries have successfully scaled their renewable energy infrastructure, and what investment patterns support this growth?
Policy & Regulatory Factors¶
What policies have been most effective in promoting renewable energy adoption?
How do government incentives and regulations affect renewable energy trends?
Are there any observable macroeconomic factors (e.g., inflation, energy prices) that influence renewable energy adoption?
Visualization & Dashboard Development¶
What types of visualizations (heatmaps, line charts, bar charts, dashboards) best represent energy consumption and forecasting insights?
How can an interactive dashboard help policymakers, businesses, and researchers make data-driven decisions?
What features should be included in the dashboard to allow users to explore data dynamically?
# the goal is the analyze the data and find the best way to predict the Strategy for AI-Powered Renewable Energy Consumption & Forecasting Dashboard
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# loading the datasets
# owid data
owid_data = pd.read_csv('data/OWID/owid-energy-data.csv')
owid_bookcode = pd.read_csv('data/OWID/owid-energy-codebook.csv')
print(
owid_data.shape,
owid_bookcode.shape,
)
(21812, 130) (130, 4)
owid_data.head()
| country | year | iso_code | population | gdp | biofuel_cons_change_pct | biofuel_cons_change_twh | biofuel_cons_per_capita | biofuel_consumption | biofuel_elec_per_capita | ... | solar_share_elec | solar_share_energy | wind_cons_change_pct | wind_cons_change_twh | wind_consumption | wind_elec_per_capita | wind_electricity | wind_energy_per_capita | wind_share_elec | wind_share_energy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASEAN (Ember) | 2000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 1 | ASEAN (Ember) | 2001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 2 | ASEAN (Ember) | 2002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 3 | ASEAN (Ember) | 2003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 4 | ASEAN (Ember) | 2004 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
5 rows × 130 columns
owid_data.sample(10)
| country | year | iso_code | population | gdp | biofuel_cons_change_pct | biofuel_cons_change_twh | biofuel_cons_per_capita | biofuel_consumption | biofuel_elec_per_capita | ... | solar_share_elec | solar_share_energy | wind_cons_change_pct | wind_cons_change_twh | wind_consumption | wind_elec_per_capita | wind_electricity | wind_energy_per_capita | wind_share_elec | wind_share_energy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9642 | Iraq | 1965 | IRQ | 8270548.0 | 4.177468e+10 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 18402 | Suriname | 2001 | SUR | 487402.0 | NaN | NaN | NaN | NaN | NaN | 0.000 | ... | 0.000 | NaN | NaN | NaN | NaN | 0.000 | 0.000 | NaN | 0.000 | NaN |
| 11548 | Malaysia | 1945 | MYS | 5454871.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1691 | Australia | 2021 | AUS | 25921094.0 | 1.296705e+12 | 3.015 | 0.036 | 47.804 | 1.239 | 128.853 | ... | 11.662 | 5.137 | 18.530 | 10.744 | 70.116 | 1033.907 | 26.800 | 2704.984 | 10.021 | 4.412 |
| 16787 | Saint Helena | 2007 | SHN | 5338.0 | NaN | NaN | NaN | NaN | NaN | 0.000 | ... | 0.000 | NaN | NaN | NaN | NaN | 0.000 | 0.000 | NaN | 0.000 | NaN |
| 7969 | Ghana | 1910 | GHA | 2945929.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10797 | Liberia | 1997 | LBR | 2383302.0 | 1.626037e+09 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6862 | Europe (EI) | 2002 | NaN | NaN | NaN | 28.315 | 2.360 | NaN | 10.694 | NaN | ... | 0.008 | 0.004 | 35.482 | 27.374 | 106.470 | NaN | 36.477 | NaN | 0.981 | 0.432 |
| 3983 | Central & South America (EIA) | 2010 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 20052 | Ukraine | 1950 | UKR | 37303000.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 rows × 130 columns
2. Global Energy Consumption & Renewable Generation || Kaggle¶
# global energy data
continent_consumption = pd.read_csv("data/global_energy/Continent_Consumption_TWH.csv")
country_consumption = pd.read_csv("data/global_energy/Country_Consumption_TWH.csv")
non_renewable_total_power_generation = pd.read_csv(
"data/global_energy/nonRenewablesTotalPowerGeneration.csv"
)
renewable_power_generation_97_17 = pd.read_csv(
"data/global_energy/renewablePowerGeneration97-17.csv"
)
renewable_total_power_generation = pd.read_csv(
"data/global_energy/renewablesTotalPowerGeneration.csv"
)
top_20_countries_power_generatoin = pd.read_csv(
"data/global_energy/top20CountriesPowerGeneration.csv"
)
# print the shapes of the datasets
print(
continent_consumption.shape,
country_consumption.shape,
non_renewable_total_power_generation.shape,
renewable_power_generation_97_17.shape,
renewable_total_power_generation.shape,
top_20_countries_power_generatoin.shape,
)
(31, 12) (31, 45) (8, 2) (28, 5) (9, 2) (20, 6)
continent_consumption.head()
| Year | World | OECD | BRICS | Europe | North America | Latin America | Asia | Pacific | Africa | Middle-East | CIS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | 101855.54 | 52602.49 | 26621.07 | 20654.88 | 24667.23 | 5373.06 | 24574.19 | 1197.89 | 4407.77 | 2581.86 | 16049.40 |
| 1 | 1991 | 102483.56 | 53207.25 | 26434.99 | 20631.62 | 24841.68 | 5500.99 | 24783.53 | 1186.26 | 4535.70 | 2744.68 | 15898.21 |
| 2 | 1992 | 102588.23 | 53788.75 | 25993.05 | 20189.68 | 25341.77 | 5628.92 | 25690.67 | 1209.52 | 4582.22 | 3081.95 | 14339.79 |
| 3 | 1993 | 103646.56 | 54614.48 | 26283.80 | 20189.68 | 25830.23 | 5675.44 | 26876.93 | 1267.67 | 4721.78 | 3349.44 | 13246.57 |
| 4 | 1994 | 104449.03 | 55579.77 | 25993.05 | 20085.01 | 26365.21 | 5989.45 | 28098.08 | 1279.30 | 4803.19 | 3640.19 | 11606.74 |
country_consumption.head()
| Year | China | United States | Brazil | Belgium | Czechia | France | Germany | Italy | Netherlands | ... | Australia | New Zealand | Algeria | Egypt | Nigeria | South Africa | Iran | Kuwait | Saudi Arabia | United Arab Emirates | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | 874 | 1910 | 141 | 48 | 50 | 225 | 351 | 147 | 67 | ... | 86 | 14 | 22 | 33 | 66 | 90 | 69 | 9 | 58 | 20 |
| 1 | 1991 | 848 | 1925 | 143 | 50 | 45 | 237 | 344 | 150 | 69 | ... | 85 | 14 | 23 | 33 | 70 | 92 | 77 | 3 | 68 | 23 |
| 2 | 1992 | 877 | 1964 | 145 | 51 | 44 | 234 | 338 | 149 | 69 | ... | 87 | 14 | 24 | 34 | 72 | 88 | 81 | 9 | 77 | 22 |
| 3 | 1993 | 929 | 1998 | 148 | 49 | 43 | 238 | 335 | 149 | 70 | ... | 91 | 15 | 24 | 35 | 74 | 94 | 87 | 12 | 80 | 23 |
| 4 | 1994 | 973 | 2036 | 156 | 52 | 41 | 231 | 333 | 147 | 70 | ... | 91 | 15 | 23 | 34 | 72 | 98 | 97 | 14 | 84 | 26 |
5 rows × 45 columns
country_consumption.head()
| Year | China | United States | Brazil | Belgium | Czechia | France | Germany | Italy | Netherlands | ... | Australia | New Zealand | Algeria | Egypt | Nigeria | South Africa | Iran | Kuwait | Saudi Arabia | United Arab Emirates | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990 | 874 | 1910 | 141 | 48 | 50 | 225 | 351 | 147 | 67 | ... | 86 | 14 | 22 | 33 | 66 | 90 | 69 | 9 | 58 | 20 |
| 1 | 1991 | 848 | 1925 | 143 | 50 | 45 | 237 | 344 | 150 | 69 | ... | 85 | 14 | 23 | 33 | 70 | 92 | 77 | 3 | 68 | 23 |
| 2 | 1992 | 877 | 1964 | 145 | 51 | 44 | 234 | 338 | 149 | 69 | ... | 87 | 14 | 24 | 34 | 72 | 88 | 81 | 9 | 77 | 22 |
| 3 | 1993 | 929 | 1998 | 148 | 49 | 43 | 238 | 335 | 149 | 70 | ... | 91 | 15 | 24 | 35 | 74 | 94 | 87 | 12 | 80 | 23 |
| 4 | 1994 | 973 | 2036 | 156 | 52 | 41 | 231 | 333 | 147 | 70 | ... | 91 | 15 | 23 | 34 | 72 | 98 | 97 | 14 | 84 | 26 |
5 rows × 45 columns
non_renewable_total_power_generation.head()
| Mode of Generation | Contribution (TWh) | |
|---|---|---|
| 0 | Coal | 9863.33 |
| 1 | Natural Gas | 5882.82 |
| 2 | Nuclear | 2636.03 |
| 3 | Oil | 841.87 |
| 4 | Waste | 114.04 |
renewable_power_generation_97_17.head()
| Year | Hydro(TWh) | Biofuel(TWh) | Solar PV (TWh) | Geothermal (TWh) | |
|---|---|---|---|---|---|
| 0 | 1990 | 2191.67 | 3.88 | 0.09 | 36.42 |
| 1 | 1991 | 2268.63 | 4.19 | 0.10 | 37.39 |
| 2 | 1992 | 2267.16 | 4.63 | 0.12 | 39.30 |
| 3 | 1993 | 2397.67 | 5.61 | 0.15 | 40.23 |
| 4 | 1994 | 2419.73 | 7.31 | 0.17 | 41.05 |
renewable_total_power_generation.head()
| Mode of Generation | Contribution (TWh) | |
|---|---|---|
| 0 | Hydro | 9863.33 |
| 1 | Wind | 5882.82 |
| 2 | Biofuel | 2636.03 |
| 3 | Solar PV | 841.87 |
| 4 | Geothermal | 114.04 |
top_20_countries_power_generatoin.head()
| Country | Hydro(TWh) | Biofuel(TWh) | Solar PV (TWh) | Geothermal (TWh) | Total (TWh) | |
|---|---|---|---|---|---|---|
| 0 | China | 1189.84 | 295.02 | 79.43 | 0.125 | 1819.940 |
| 1 | USA | 315.62 | 277.91 | 58.95 | 18.960 | 758.619 |
| 2 | Brazil | 370.90 | 42.37 | 52.25 | 0.000 | 466.350 |
| 3 | Canada | 383.48 | 29.65 | 7.12 | 0.000 | 424.090 |
| 4 | India | 141.80 | 51.06 | 43.76 | 0.000 | 262.650 |
Energy Generation & Consumption (from multiple sources)¶
electricity_consumption_statistics = pd.read_csv('data/IRR_cleaned/ELECSTAT_CLEANED.csv')
heat_generations = pd.read_csv('data/IRR_cleaned/HEATGEN_CLEANED.csv')
share_of_renewables = pd.read_csv('data/IRR_cleaned/RESHARE_CLEANED.csv')
investment_in_energy_infrastructure = pd.read_csv('data/IRR_cleaned/PUBFIN_CLEANED.csv')
print(
electricity_consumption_statistics.shape,
heat_generations.shape,
share_of_renewables.shape,
investment_in_energy_infrastructure.shape,
)
(9648, 7) (9254, 5) (10123, 4) (197064, 4)
electricity_consumption_statistics.head()
| Region_Tech_Desc | Category | Data_Type | Year | Electricity_Output_GWh | Grid_Connection | Miscellaneous | |
|---|---|---|---|---|---|---|---|
| 0 | World | Total renewable | Electricity Generation (GWh) | 2000.0 | 2846212 | NaN | NaN |
| 1 | World | Total renewable | Electricity Installed Capacity (MW) | 2000.0 | 752238.54 | NaN | NaN |
| 2 | World | Solar energy | Electricity Generation (GWh) | 2000.0 | 1312.13 | NaN | NaN |
| 3 | World | Solar energy | Electricity Installed Capacity (MW) | 2000.0 | 1215.68 | NaN | NaN |
| 4 | World | Wind energy | Electricity Generation (GWh) | 2000.0 | 30944.47 | NaN | NaN |
heat_generations.head()
| Country/Area | Technology | Grid Connection | Year | Heat Generation (TJ) | |
|---|---|---|---|---|---|
| 0 | Albania | Coal and peat | Heat (Commercial) | 2000 | 84.0 |
| 1 | Albania | Coal and peat | Heat (Commercial) | 2001 | 109.0 |
| 2 | Albania | Coal and peat | Heat (Commercial) | 2002 | 77.0 |
| 3 | Albania | Oil | Combined Heat and Power (CHP) | 2000 | 145.0 |
| 4 | Albania | Oil | Combined Heat and Power (CHP) | 2001 | 138.0 |
share_of_renewables.head()
| Region/Country | Indicator | Year | Value | |
|---|---|---|---|---|
| 0 | World | RE share of electricity generation (%) | 2000 | 18.31 |
| 1 | World | NaN | 2001 | 17.82 |
| 2 | World | NaN | 2002 | 17.76 |
| 3 | World | NaN | 2003 | 17.29 |
| 4 | World | NaN | 2004 | 17.73 |
investment_in_energy_infrastructure.head()
| Country | Technology | Year | Investment | |
|---|---|---|---|---|
| 0 | Afghanistan | On-grid Solar photovoltaic | 2022 | 0.00 |
| 1 | Afghanistan | On-grid Solar photovoltaic | 2021 | 0.00 |
| 2 | Afghanistan | On-grid Solar photovoltaic | 2020 | 0.00 |
| 3 | Afghanistan | On-grid Solar photovoltaic | 2019 | 4.38 |
| 4 | Afghanistan | On-grid Solar photovoltaic | 2018 | 48.17 |
1. Understanding the Datasets¶
After reviewing the three provided datasets, we can categorize them as follows:
1. Dataset Group 1: Our World in Data (OWID) - Global Energy Dataset¶
Section Summary:
owid-energy-data.csv: A dataset covering global energy production, electricity mix, and energy consumption trends from various sources (hydro, wind, solar, fossil fuels, etc.).
owid-energy-codebook.csv: A codebook detailing column descriptions and data sources for the OWID dataset.
owid_data.columns
Index(['country', 'year', 'iso_code', 'population', 'gdp',
'biofuel_cons_change_pct', 'biofuel_cons_change_twh',
'biofuel_cons_per_capita', 'biofuel_consumption',
'biofuel_elec_per_capita',
...
'solar_share_elec', 'solar_share_energy', 'wind_cons_change_pct',
'wind_cons_change_twh', 'wind_consumption', 'wind_elec_per_capita',
'wind_electricity', 'wind_energy_per_capita', 'wind_share_elec',
'wind_share_energy'],
dtype='object', length=130)
owid_bookcode.columns
Index(['column', 'description', 'unit', 'source'], dtype='object')
# save the output on a text file
owid_data.describe().to_csv('outputs/exploring_outputs/owid/owid_data_describe.csv')
owid_data.describe()
| year | population | gdp | biofuel_cons_change_pct | biofuel_cons_change_twh | biofuel_cons_per_capita | biofuel_consumption | biofuel_elec_per_capita | biofuel_electricity | biofuel_share_elec | ... | solar_share_elec | solar_share_energy | wind_cons_change_pct | wind_cons_change_twh | wind_consumption | wind_elec_per_capita | wind_electricity | wind_energy_per_capita | wind_share_elec | wind_share_energy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21812.000000 | 1.844700e+04 | 1.177500e+04 | 1806.000000 | 2796.000000 | 2400.000000 | 2876.000000 | 5570.000000 | 5834.000000 | 5811.000000 | ... | 7183.000000 | 4127.000000 | 2508.000000 | 4740.000000 | 4845.000000 | 7278.000000 | 8234.000000 | 4153.000000 | 7197.000000 | 4193.000000 |
| mean | 1974.195718 | 1.054051e+08 | 4.260596e+11 | 45.489759 | 2.867027 | 136.600523 | 39.082519 | 66.750870 | 11.262839 | 1.865575 | ... | 0.772870 | 0.258574 | 306.388427 | 7.886606 | 60.060250 | 67.711201 | 18.936078 | 254.848409 | 1.491997 | 0.717834 |
| std | 35.342860 | 4.665375e+08 | 3.508591e+12 | 266.131064 | 10.692769 | 261.757657 | 116.307666 | 197.542142 | 49.585627 | 5.372183 | ... | 2.495118 | 0.773014 | 6297.500646 | 39.805408 | 306.013269 | 267.759489 | 110.680131 | 791.447343 | 4.688010 | 2.018520 |
| min | 1900.000000 | 1.833000e+03 | 1.642060e+08 | -100.000000 | -49.355000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | -100.000000 | -50.829000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1946.000000 | 1.714291e+06 | 1.438637e+10 | -0.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 3.921750 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 1984.000000 | 6.998022e+06 | 4.393385e+10 | 8.189000 | 0.000000 | 14.642500 | 0.694500 | 0.491000 | 0.010000 | 0.096000 | ... | 0.000000 | 0.000000 | 19.564500 | 0.000000 | 0.008000 | 0.000000 | 0.000000 | 0.084000 | 0.000000 | 0.004000 |
| 75% | 2004.000000 | 2.571993e+07 | 1.830838e+11 | 26.550000 | 0.832000 | 175.006500 | 12.811500 | 38.162750 | 0.720000 | 1.526500 | ... | 0.138500 | 0.026000 | 46.448500 | 0.359250 | 2.782000 | 3.080750 | 0.150000 | 56.324000 | 0.419000 | 0.320000 |
| max | 2023.000000 | 8.045311e+09 | 1.301126e+14 | 5659.328000 | 144.146000 | 2588.512000 | 1317.625000 | 2453.043000 | 678.740000 | 77.586000 | ... | 50.000000 | 9.368000 | 242384.844000 | 677.824000 | 6040.359000 | 3283.758000 | 2304.440000 | 8523.436000 | 57.716000 | 25.767000 |
8 rows × 128 columns
The bookcode provides metadata about the cols, we can benefit from that by visually navigate through the csv.
After Reviewing the bookcode CSV, I can benefit from the "column" and the "units" columns through my programatically exploring into this dataset.
owid_bookcode = owid_bookcode[['column', 'unit']]
owid_bookcode.dropna(inplace=True)
owid_bookcode.head()
| column | unit | |
|---|---|---|
| 3 | population | persons |
| 4 | gdp | international-$ in 2011 prices |
| 5 | biofuel_cons_change_pct | % |
| 6 | biofuel_cons_change_twh | terawatt-hours |
| 7 | biofuel_cons_per_capita | kilowatt-hours |
owid_info_df = pd.DataFrame(owid_data.dtypes, columns=['data_type']).reset_index()
owid_info_df.to_csv('outputs/exploring_outputs/owid/owid_info.csv')
# owid_info_df['missing_values'] = owid_data.isnull().sum()
owid_data.value_counts().to_csv('outputs/exploring_outputs/owid/owid_data_value_counts.csv')
missing_df = owid_data.isnull().sum().reset_index()
missing_df.columns = ['column', 'missing_values']
missing_df['total_values'] = owid_data.shape[0]
missing_df['missing_percentage'] = missing_df['missing_values'] / missing_df['total_values'] * 100
missing_df.sort_values('missing_percentage', ascending=False, inplace=True)
missing_df.to_csv('outputs/exploring_outputs/owid/owid_data_missing.csv')
def basic_info(df, name):
print(f"\n{name} Dataset Info:")
print(df.info())
print("\nMissing values:")
print(df.isnull().sum())
print("\nSummary Statistics:")
print(df.describe(include='all'))
basic_info(owid_data, "Energy Data")
Energy Data Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21812 entries, 0 to 21811
Columns: 130 entries, country to wind_share_energy
dtypes: float64(127), int64(1), object(2)
memory usage: 21.6+ MB
None
Missing values:
country 0
year 0
iso_code 5000
population 3365
gdp 10037
...
wind_elec_per_capita 14534
wind_electricity 13578
wind_energy_per_capita 17659
wind_share_elec 14615
wind_share_energy 17619
Length: 130, dtype: int64
Summary Statistics:
country year iso_code population gdp \
count 21812 21812.000000 16812 1.844700e+04 1.177500e+04
unique 294 NaN 220 NaN NaN
top Zimbabwe NaN DZA NaN NaN
freq 124 NaN 124 NaN NaN
mean NaN 1974.195718 NaN 1.054051e+08 4.260596e+11
std NaN 35.342860 NaN 4.665375e+08 3.508591e+12
min NaN 1900.000000 NaN 1.833000e+03 1.642060e+08
25% NaN 1946.000000 NaN 1.714291e+06 1.438637e+10
50% NaN 1984.000000 NaN 6.998022e+06 4.393385e+10
75% NaN 2004.000000 NaN 2.571993e+07 1.830838e+11
max NaN 2023.000000 NaN 8.045311e+09 1.301126e+14
biofuel_cons_change_pct biofuel_cons_change_twh \
count 1806.000000 2796.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 45.489759 2.867027
std 266.131064 10.692769
min -100.000000 -49.355000
25% -0.500000 0.000000
50% 8.189000 0.000000
75% 26.550000 0.832000
max 5659.328000 144.146000
biofuel_cons_per_capita biofuel_consumption biofuel_elec_per_capita \
count 2400.000000 2876.000000 5570.000000
unique NaN NaN NaN
top NaN NaN NaN
freq NaN NaN NaN
mean 136.600523 39.082519 66.750870
std 261.757657 116.307666 197.542142
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 14.642500 0.694500 0.491000
75% 175.006500 12.811500 38.162750
max 2588.512000 1317.625000 2453.043000
... solar_share_elec solar_share_energy wind_cons_change_pct \
count ... 7183.000000 4127.000000 2508.000000
unique ... NaN NaN NaN
top ... NaN NaN NaN
freq ... NaN NaN NaN
mean ... 0.772870 0.258574 306.388427
std ... 2.495118 0.773014 6297.500646
min ... 0.000000 0.000000 -100.000000
25% ... 0.000000 0.000000 3.921750
50% ... 0.000000 0.000000 19.564500
75% ... 0.138500 0.026000 46.448500
max ... 50.000000 9.368000 242384.844000
wind_cons_change_twh wind_consumption wind_elec_per_capita \
count 4740.000000 4845.000000 7278.000000
unique NaN NaN NaN
top NaN NaN NaN
freq NaN NaN NaN
mean 7.886606 60.060250 67.711201
std 39.805408 306.013269 267.759489
min -50.829000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.008000 0.000000
75% 0.359250 2.782000 3.080750
max 677.824000 6040.359000 3283.758000
wind_electricity wind_energy_per_capita wind_share_elec \
count 8234.000000 4153.000000 7197.000000
unique NaN NaN NaN
top NaN NaN NaN
freq NaN NaN NaN
mean 18.936078 254.848409 1.491997
std 110.680131 791.447343 4.688010
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.084000 0.000000
75% 0.150000 56.324000 0.419000
max 2304.440000 8523.436000 57.716000
wind_share_energy
count 4193.000000
unique NaN
top NaN
freq NaN
mean 0.717834
std 2.018520
min 0.000000
25% 0.000000
50% 0.004000
75% 0.320000
max 25.767000
[11 rows x 130 columns]
display(owid_data.head())
| country | year | iso_code | population | gdp | biofuel_cons_change_pct | biofuel_cons_change_twh | biofuel_cons_per_capita | biofuel_consumption | biofuel_elec_per_capita | ... | solar_share_elec | solar_share_energy | wind_cons_change_pct | wind_cons_change_twh | wind_consumption | wind_elec_per_capita | wind_electricity | wind_energy_per_capita | wind_share_elec | wind_share_energy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASEAN (Ember) | 2000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 1 | ASEAN (Ember) | 2001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 2 | ASEAN (Ember) | 2002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 3 | ASEAN (Ember) | 2003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
| 4 | ASEAN (Ember) | 2004 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.0 | NaN |
5 rows × 130 columns
def plot_energy_trends(df, energy_source):
"""Plot trends in energy production/consumption over time."""
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="year", y=energy_source, hue="country", legend=False)
plt.title(f"Trends in {energy_source}")
plt.xlabel("Year")
plt.ylabel(energy_source)
plt.savefig(f"outputs/exploring_outputs/owid/figures/{energy_source}_trends.png")
def plot_energy_correlation(df):
"""Plot correlation matrix of energy sources."""
plt.figure(figsize=(12, 6))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Matrix of Energy Sources")
plt.show()
plt.savefig(f"outputs/exploring_outputs/owid/figures/energy_correlation.png")
def plot_energy_sources(df):
"""Plot energy sources."""
plt.figure(figsize=(12, 6))
sns.barplot(data=df, x="value", y="source", palette="viridis")
plt.title("Energy Sources")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(f"outputs/exploring_outputs/owid/figures/energy_sources.png")
def plot_energy_sources_by_country(df, country):
"""Plot energy sources by country."""
plt.figure(figsize=(12, 6))
sns.barplot(
data=df[df["country"] == country], x="value", y="source", palette="viridis"
)
plt.title(f"Energy Sources in {country}")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(f"outputs/exploring_outputs/owid/figures/{country}_energy_sources.png")
def plot_energy_sources_by_year(df, year):
"""Plot energy sources by year."""
plt.figure(figsize=(12, 6))
sns.barplot(data=df[df["year"] == year], x="value", y="source", palette="viridis")
plt.title(f"Energy Sources in {year}")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(f"outputs/exploring_outputs/owid/figures/{year}_energy_sources.png")
def plot_energy_sources_by_continent(df, continent):
"""Plot energy sources by continent."""
plt.figure(figsize=(12, 6))
sns.barplot(
data=df[df["continent"] == continent], x="value", y="source", palette="viridis"
)
plt.title(f"Energy Sources in {continent}")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(
f"outputs/exploring_outputs/owid/figures/{continent}_energy_sources.png"
)
def plot_energy_sources_by_region(df, region):
"""Plot energy sources by region."""
plt.figure(figsize=(12, 6))
sns.barplot(
data=df[df["region"] == region], x="value", y="source", palette="viridis"
)
plt.title(f"Energy Sources in {region}")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(f"outputs/exploring_outputs/owid/figures/{region}_energy_sources.png")
def plot_energy_sources_by_income_group(df, income_group):
"""Plot energy sources by income group."""
plt.figure(figsize=(12, 6))
sns.barplot(
data=df[df["income_group"] == income_group],
x="value",
y="source",
palette="viridis",
)
plt.title(f"Energy Sources in {income_group}")
plt.xlabel("Value")
plt.ylabel("Source")
plt.show()
plt.savefig(
f"outputs/exploring_outputs/owid/figures/{income_group}_energy_sources.png"
)
energy_columns = [col for col in owid_data.columns if any(x in col for x in ['energy', 'electricity', 'consumption', 'production'])]
print("Available energy-related columns:")
print(energy_columns)
Available energy-related columns: ['biofuel_consumption', 'biofuel_electricity', 'biofuel_share_energy', 'coal_consumption', 'coal_electricity', 'coal_production', 'coal_share_energy', 'electricity_demand', 'electricity_demand_per_capita', 'electricity_generation', 'electricity_share_energy', 'energy_cons_change_pct', 'energy_cons_change_twh', 'energy_per_capita', 'energy_per_gdp', 'fossil_electricity', 'fossil_energy_per_capita', 'fossil_fuel_consumption', 'fossil_share_energy', 'gas_consumption', 'gas_electricity', 'gas_energy_per_capita', 'gas_production', 'gas_share_energy', 'hydro_consumption', 'hydro_electricity', 'hydro_energy_per_capita', 'hydro_share_energy', 'low_carbon_consumption', 'low_carbon_electricity', 'low_carbon_energy_per_capita', 'low_carbon_share_energy', 'nuclear_consumption', 'nuclear_electricity', 'nuclear_energy_per_capita', 'nuclear_share_energy', 'oil_consumption', 'oil_electricity', 'oil_energy_per_capita', 'oil_production', 'oil_share_energy', 'other_renewable_consumption', 'other_renewable_electricity', 'other_renewable_exc_biofuel_electricity', 'other_renewables_energy_per_capita', 'other_renewables_share_energy', 'per_capita_electricity', 'primary_energy_consumption', 'renewables_consumption', 'renewables_electricity', 'renewables_energy_per_capita', 'renewables_share_energy', 'solar_consumption', 'solar_electricity', 'solar_energy_per_capita', 'solar_share_energy', 'wind_consumption', 'wind_electricity', 'wind_energy_per_capita', 'wind_share_energy']
for col in energy_columns:
if owid_data[col].dtype in ['float64', 'int64']:
plot_energy_trends(owid_data, col)
C:\Users\obada\AppData\Local\Temp\ipykernel_22452\3194003883.py:3: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`. plt.figure(figsize=(12, 6))
def plot_energy_distribution(df, energy_source):
"""Plot distribution of energy production/consumption."""
plt.figure(figsize=(12, 6))
sns.histplot(df[energy_source], bins=30, kde=True)
plt.title(f"Distribution of {energy_source}")
plt.xlabel(energy_source)
plt.ylabel("Frequency")
plt.savefig(
f"outputs/exploring_outputs/owid/figures/{energy_source}_distribution.png"
)
plot_energy_distribution(owid_data, "renewables_consumption")
2. Dataset Group 2: Global Energy Consumption & Renewable Generation¶
renewablePowerGeneration97-17.csv: Tracks renewable power generation trends from 1997 to 2017 across different energy types (hydro, wind, biofuel, solar, geothermal).
renewablesTotalPowerGeneration.csv: Summarizes total renewable power generation in TWh globally.
nonRenewablesTotalPowerGeneration.csv: Summarizes total non-renewable power generation in TWh globally.
top20CountriesPowerGeneration.csv: Highlights the top 20 countries' renewable energy generation.
Country_Consumption_TWH.csv: Records national energy consumption trends.
Continent_Consumption_TWH.csv: Records energy consumption trends at a continental level.
Dataset Group 3: Energy Generation & Consumption (from multiple sources)¶
HEATGEN_CLEANED.csv: Contains cleaned data on heat generation. This dataset could be useful for understanding how different energy sources contribute to overall energy production.
ELECSTAT_CLEANED.csv: Likely includes statistics on electricity consumption, possibly broken down by country and year.
RESHARE_CLEANED.csv: Appears to contain information on the share of renewable energy sources in overall energy consumption.
PUBFIN_CLEANED.csv: May provide financial data related to public investment in energy infrastructure.
2. Data Suitability for Forecasting and Visualization¶
Based on the project requirements, we can determine how each dataset fits into the forecasting and visualization strategy:
A. Forecasting Future Renewable Energy Trends (5-10 years)¶
Store the jupyter notebook¶
!jupyter nbconvert --to script "data_inspection.ipynb" --output-dir="outputs/scripts"
!jupyter nbconvert --to html "data_inspection.ipynb" --output-dir="outputs/html"
C:\Users\obada\AppData\Local\Programs\Python\Python313\Scripts\jupyter-nbconvert.EXE\__main__.py:4: DeprecationWarning: Parsing dates involving a day of month without a year specified is ambiguious and fails to parse leap day. The default behavior will change in Python 3.15 to either always raise an exception or to use a different default year (TBD). To avoid trouble, add a specific year to the input & format. See https://github.com/python/cpython/issues/70647. [NbConvertApp] Converting notebook data_inspection.ipynb to script [NbConvertApp] Writing 14173 bytes to outputs\scripts\data_inspection.py